Approximately Integrable Linear Statistical Models
in  Non-Parametric  Estimation 

B.  Ya.  Levit 
Summary  1 

The notion of approximately integrable linear statistical models is introduced to analyze
the higher order optimality properties of some common nonparametric estimators.
The approximately integrable models suggest a useful approach to a unified treatment of
both regular and irregular nonparametric problems. It is shown that with such models
any rate of improvement of the classical nonparametric procedures can be anticipated,
ranging from $(\log n)^{\alpha}/n^{2}$ to $1/(n(\log \ldots \log n)^{\alpha})$, $\alpha > 0$.
Both an example of a first order asymptotically optimal estimator with the unusual rate
$n^{-1}\log n$ and an estimator with an extremely slow unimprovable rate of convergence
$1/(\log \ldots \log n)^{\alpha}$ are presented.



The  research  was  supported  by  the  Department  of  Statistics,  Purdue  University  under 
grants  NSF  DMS-8923071  and  ONR  Contract  N00014-89-K-0170. 


1.  Introduction 

The aim of the present report is to develop the notion of approximately integrable
linear (a.i.l.) statistical models related to the study of the “next” order optimality in
nonparametric estimation. For the present it seems reasonable to keep the exposition at the
least technical level, restricted to quadratic losses and scalar valued functionals. At the
same time, the reader will probably notice a number of generalizations readily suggesting
themselves, some of which are to be reported elsewhere.

A useful lower bound for a local minimax risk in estimating such functionals will be
derived (Section 3). Based on this bound it will be demonstrated that with a.i.l. models any
rate of the “next” order improvement of the (first order) asymptotically optimal estimators
may be anticipated, ranging from $(\log n)^{\alpha}/n^{2}$ to
$1/((\underbrace{\log \ldots \log}_{k} n)^{\alpha} n)$, for $k = 1, 2, \ldots$, $\alpha > 0$.

Clearly, when this is the case the next order improvement may well challenge the
asymptotic optimality of a given first order efficient estimator. At the same time, with the
a.i.l. models one easily discloses nonparametric problems with first order efficient estimators
converging at rates (e.g. $n^{-1}\log n$) different from the common one ($1/n$).

Another notable point is that the a.i.l. models appear to be rather well tailored to
incorporate both regular problems (e.g. cdf estimation) and irregular problems (such as
estimation of the derivatives of the cdf). Both types can then be treated along similar lines
using the above mentioned lower bound. With this approach one discovers a close relation
between the optimal rates of improving the standard estimators of the regular functionals
and the optimal estimability rates for the irregular ones.

We  introduce  a.i.l.  models  after  presenting  some  prerequisites. 

2.  Some  Preliminaries  and  Definitions 

Let $X_1, \ldots, X_n$ be an independent sample in a measurable space $(\mathcal{X}, \mathcal{A})$ with a
common distribution $F$ ranging in a given subset $\mathcal{F}$ of distributions defined on $\mathcal{A}$. It will prove
convenient to supply $\mathcal{F}$ with a relevant topology $\mathcal{T}$. While different competing measures of
closeness on $\mathcal{F}$ are readily available, at this stage it appears difficult to argue conclusively
in favor of any particular one. Still, mainly for its clear statistical meaning, we will make
use in the sequel of the topology $\mathcal{T} = \mathcal{T}_1$ on $\mathcal{F}$ induced by the distance in variation, just to
fix a workable and relatively simple one.

Given a real valued function $\Psi(F)$, $F \in \mathcal{F}$, we address below optimal rates of
estimability and, provided first order efficient estimators exist, higher order optimality
properties in estimating the unknown value $\Psi(F)$ based on given observations.

Let $\Psi_n = \Psi_n(X_1, \ldots, X_n)$ be an arbitrary estimator of $\Psi(F)$ and

$$R_n(\Psi_n, F) = E_F(\Psi_n - \Psi(F))^2. \tag{2.1}$$

While there are plenty of loss functions one can choose from, the particular one in (2.1)
serves well the purposes of this presentation. By an estimator $\Psi_n$ we mean below any
sequence of estimators $\Psi_n$, $n \ge 1$.
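The quadratic risk (2.1) can be approximated by simulation when the sampling distribution is known. The following is a minimal sketch, not part of the original report: the function name `monte_carlo_risk`, the sampler interface and the choice of a standard normal $F$ are illustrative assumptions.

```python
import random
import statistics
from typing import Callable, Sequence


def monte_carlo_risk(
    estimator: Callable[[Sequence[float]], float],
    sampler: Callable[[random.Random], float],
    true_value: float,
    n: int,
    reps: int = 4000,
    seed: int = 0,
) -> float:
    """Approximate R_n(Psi_n, F) = E_F (Psi_n - Psi(F))^2 by simulation.

    `sampler` draws one observation from F (an assumed interface);
    `true_value` is Psi(F), which must be known for the chosen F.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        total += (estimator(sample) - true_value) ** 2
    return total / reps


# Illustration: sample mean under a standard normal F, where the exact risk is 1/n.
risk = monte_carlo_risk(statistics.fmean, lambda rng: rng.gauss(0.0, 1.0), 0.0, n=25)
```

For the sample mean under a unit-variance distribution the simulated value should be close to $1/n$, up to Monte Carlo error.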

Let us recall next some asymptotic properties a reasonable estimator of $\Psi(F)$ is
expected to share. The underlying common idea behind the different definitions to be used
below is that a “nice” estimator should exhibit reasonable global consistency properties
while being locally unimprovable. To this end we present the following definitions, keeping
in mind their reference to a given underlying set of distributions $\mathcal{F}$.

Definition 1. The function $\Psi(F)$ is called

a) $\rho(n)$-rate estimable if there exists an estimator $\Psi_n$ such that, locally uniformly in $F$,

$$R_n(\Psi_n, F) = O(\rho(n)), \quad (n \to \infty);$$

b) exactly $\rho(n)$-rate estimable if it is $\rho(n)$-rate estimable and moreover for any sequence
$\rho'(n)$, $\rho'(n)/\rho(n) \to 0$, and any non-empty vicinity $V \in \mathcal{T}$ no estimator $\Psi_n$ satisfies the
relation

$$R_n(\Psi_n, F) = O(\rho'(n))$$

uniformly on $V$.

Assume that $\Psi(F)$ is exactly $\rho(n)$-rate estimable. The next definition refers to the
first order asymptotically optimal properties in estimating $\Psi(F)$.

Definition 2. An estimator $\Psi_n$ is called locally asymptotically unimprovable or first
order asymptotically optimal if for any non-empty vicinity $V$ and a positive number $R$
there exists $n_0$ such that for $n > n_0$ no estimator $\Psi_n'$ satisfies the inequality

$$R_n(\Psi_n', F) < R_n(\Psi_n, F) - R\rho(n), \quad F \in V.$$

Let now $\Psi_n$ be a first order asymptotically optimal estimator. The following definition
refers to the “next” order properties of $\Psi_n$.

Definition 3. The estimator $\Psi_n$ is called

a) $\rho_1(n)$-rate improvable if there exists a non-empty vicinity $V$, a positive number $R$ and an
estimator $\Psi_n'$ such that locally uniformly in $F$

$$R_n(\Psi_n', F) \le R_n(\Psi_n, F) - R\rho_1(n)\mathbf{1}(F \in V) + o(\rho_1(n)).$$

Otherwise $\Psi_n$ is called $\rho_1(n)$-rate unimprovable on $\mathcal{F}$ (here $\rho_1(n)/\rho(n) = o(1)$, $n \to \infty$);

b) exactly $\rho_1(n)$-rate improvable if it is $\rho_1(n)$-rate improvable and moreover for any
non-empty vicinity $V$ and any sequence $\rho_1'(n)$, $\rho_1'(n)/\rho_1(n) \to \infty$, $n \to \infty$, $\Psi_n$ is $\rho_1'(n)$-rate
unimprovable on $V$.

Let $\psi\colon \mathcal{X} \to R^1$ be a measurable function. It appears that the linear functionals of the form

$$\Psi(F) = \int \psi(x)\,dF(x) \tag{2.2}$$

provide useful approximations to a variety of meaningful nonparametric functionals, both
regular and irregular.

Let

$$\Psi_n = \frac{1}{n}\sum_{i=1}^{n}\psi(X_i).$$

Conditions for asymptotic optimality of the estimator $\Psi_n$ were found in Levit (1974); see
also Koshevnik and Levit (1976). We summarize below the corresponding result for the
sake of reference.
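As an illustration of the linear functional (2.2) and its plug-in estimator, the sketch below averages $\psi$ over a sample. The helper names `plug_in_estimate` and `indicator_at_most` and the toy data are assumptions made for this example only; taking $\psi$ to be an indicator recovers the empirical cdf, the regular problem mentioned in the Introduction.

```python
from typing import Callable, Sequence


def plug_in_estimate(psi: Callable[[float], float], sample: Sequence[float]) -> float:
    """Plug-in estimator Psi_n = n^{-1} * sum psi(X_i) of Psi(F) = integral of psi dF."""
    return sum(psi(x) for x in sample) / len(sample)


def indicator_at_most(t: float) -> Callable[[float], float]:
    """psi(x) = 1(x <= t), so that Psi(F) = F(t)."""
    return lambda x: 1.0 if x <= t else 0.0


sample = [0.2, 1.5, -0.7, 3.1, 0.9]  # toy data, not from the report
ecdf_at_1 = plug_in_estimate(indicator_at_most(1.0), sample)  # empirical cdf at t = 1
mean_est = plug_in_estimate(lambda x: x, sample)              # psi(x) = x gives the sample mean
```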

Theorem 2.1. Assume the set $\mathcal{F}$ satisfies the following conditions:

1) $\int \psi^2(x)\,dF$ is locally bounded in $F$,

2) for any $F \in \mathcal{F}$ there exist a sequence of functions $\psi_k(x)$ and positive numbers $\alpha_k$
such that $\mathcal{F}$ contains, for any $k$, the exponential family of distributions $G_c$ defined by the
relation

$$\frac{dG_c}{dF} = \exp\{c\psi_k(x) - b(c)\}, \quad |c| < \alpha_k, \tag{2.3}$$

and

$$\lim_{k \to \infty} \int (\psi_k(x) - \psi(x))^2\,dF = 0. \tag{2.4}$$

Then $\Psi_n$ is a first order asymptotically optimal estimator of $\Psi(F)$.

Conditions (2.2)-(2.4) represent corresponding linearity and integrability properties
of the functional $\Psi(F)$; the full meaning of the latter term is to be explained in subsequent
publications. We elaborate further on approximately integrable linear models in the next
section.

3. Approximately Integrable Linear Models: Lower Bounds

Let $\Psi(F)$ be a given functional to be estimated from the sample $X_1, \ldots, X_n$ and
$\psi_n(x)$, $x \in \mathcal{X}$, a sequence of real valued measurable functions. Denote

$$B_{n,F} = E_F\psi_n(X) - \Psi(F), \qquad \sigma^2_{n,F} = \mathrm{Var}_F\,\psi_n(X), \tag{3.0}$$

where $X$ is distributed according to $F$.

In the particular case of the functional (2.2) with

$$\sigma_F^2 = \mathrm{Var}_F\,\psi(X) < \infty$$

denote also

$$\Delta_{n,F} = \sigma_F^2 - \sigma^2_{n,F}. \tag{3.0'}$$

Approximately integrable linear (a.i.l.) models to be considered below can be
described by the following two assumptions.

Assumption AL (approximate linearity). Locally uniformly in $F$

$$n^{-1}\sigma^2_{n,F} + B^2_{n,F} = o(1), \quad n \to \infty.$$

Assumption AI (approximate integrability). For every $F \in \mathcal{F}$ and any of its vicinities
$V$ there exists a positive $\alpha_n$ such that the exponential family of distributions $G_{n,c}$ defined
by the relations

$$dG_{n,c} = g(x,c)\,dF = \exp\{c\psi_n(x) - b_n(c)\}\,dF, \tag{3.1}$$

$$b_n(c) = \log \int \exp\{c\psi_n(x)\}\,dF \tag{3.1'}$$

exists and belongs to $V$ for $|c| < \alpha_n$.

Due to the assumption AL,

$$\Psi_n = \frac{1}{n}\sum_{i=1}^{n}\psi_n(X_i)$$

is a consistent estimator of $\Psi(F)$, while $\sigma^2_{n,F}$ and $B_{n,F}$ provide an upper bound in estimating
$\Psi(F)$ as

$$R_n(\Psi_n, F) = n^{-1}\sigma^2_{n,F} + B^2_{n,F}. \tag{3.1''}$$

It is to be shown next that they provide a useful lower bound in estimating $\Psi(F)$ as well.
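The risk decomposition (3.1'') can be checked exactly on a small discrete distribution by enumerating all samples of size $n$. This is only a sketch under assumptions that are not in the report: the finite support, the probabilities, the truncation level and the function names are invented for the check, and the functional is taken to be the mean.

```python
import math
from itertools import product
from typing import Callable, Sequence


def exact_risk(support: Sequence[float], probs: Sequence[float],
               psi: Callable[[float], float], n: int) -> float:
    """E_F (Psi_n - Psi(F))^2 for Psi(F) = E_F X, by enumerating all n-samples."""
    target = sum(p * x for x, p in zip(support, probs))
    risk = 0.0
    for idx in product(range(len(support)), repeat=n):
        weight = math.prod(probs[i] for i in idx)          # probability of this sample
        estimate = sum(psi(support[i]) for i in idx) / n   # Psi_n on this sample
        risk += weight * (estimate - target) ** 2
    return risk


def decomposed_risk(support: Sequence[float], probs: Sequence[float],
                    psi: Callable[[float], float], n: int) -> float:
    """Right-hand side of (3.1''): n^{-1} * sigma^2_{n,F} + B^2_{n,F}."""
    target = sum(p * x for x, p in zip(support, probs))
    mean_psi = sum(p * psi(x) for x, p in zip(support, probs))
    var_psi = sum(p * (psi(x) - mean_psi) ** 2 for x, p in zip(support, probs))
    return var_psi / n + (mean_psi - target) ** 2


support = [-1.0, 0.0, 1.0, 4.0]              # hypothetical support
probs = [0.3, 0.4, 0.2, 0.1]                 # hypothetical probabilities
psi = lambda x: x if abs(x) <= 2.0 else 0.0  # a truncated version of psi(x) = x
lhs = exact_risk(support, probs, psi, n=3)
rhs = decomposed_risk(support, probs, psi, n=3)
```

The two quantities agree up to rounding, which is the variance plus squared bias identity behind (3.1'').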
Whenever the family (3.1) is involved we will denote

$$\Delta_{n,F}(c) = \Delta_{n,G_{n,c}} \tag{3.2}$$

etc.
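For a distribution with finite support, the exponential family (3.1), the normalizer (3.1') and the tilted quantities $B_{n,F}(c)$, $\sigma^2_{n,F}(c)$ of (3.2) can be computed directly. The sketch below assumes a hypothetical four-point distribution, a truncated $\psi_n$ and $\Psi(F) = E_F X$; none of these choices comes from the report.

```python
import math
from typing import Callable, List, Sequence, Tuple


def tilt(support: Sequence[float], probs: Sequence[float],
         psi: Callable[[float], float], c: float) -> Tuple[List[float], float]:
    """Return the tilted probabilities g(x, c) * p(x) of (3.1) and b(c) of (3.1')."""
    weights = [p * math.exp(c * psi(x)) for x, p in zip(support, probs)]
    b = math.log(sum(weights))
    return [w / math.exp(b) for w in weights], b


def bias_and_variance(support: Sequence[float], probs: Sequence[float],
                      psi: Callable[[float], float]) -> Tuple[float, float]:
    """B = E psi(X) - E X and sigma^2 = Var psi(X) under the given probabilities."""
    mean_x = sum(p * x for x, p in zip(support, probs))
    mean_psi = sum(p * psi(x) for x, p in zip(support, probs))
    var_psi = sum(p * (psi(x) - mean_psi) ** 2 for x, p in zip(support, probs))
    return mean_psi - mean_x, var_psi


support = [-1.0, 0.0, 1.0, 4.0]              # hypothetical support
probs = [0.3, 0.4, 0.2, 0.1]                 # hypothetical probabilities under F
psi = lambda x: x if abs(x) <= 2.0 else 0.0  # truncated psi_n with level 2

tilted, b = tilt(support, probs, psi, c=0.25)
bias_c, var_c = bias_and_variance(support, tilted, psi)  # B_{n,F}(c), sigma^2_{n,F}(c)
bias_0, var_0 = bias_and_variance(support, probs, psi)   # the same quantities at c = 0
```

The derivative $b_n'(c)$ equals the mean of $\psi_n$ under $G_{n,c}$, the standard exponential family identity, which can be confirmed here by a finite difference.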

Let $\Phi(\alpha)$ denote the class of continuously differentiable probability densities $\lambda$
vanishing outside the interval $(-\alpha, \alpha)$ with

$$I(\lambda) = \int \frac{(\lambda'(c))^2}{\lambda(c)}\,dc < \infty.$$
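A concrete member of $\Phi(1)$ is $\lambda(c) = \cos^2(\pi c/2)$ on $(-1, 1)$, for which $I(\lambda) = \pi^2$. This density is chosen here purely for illustration; the report does not single out a particular $\lambda$. The sketch below evaluates $I(\lambda)$ with a midpoint rule.

```python
import math
from typing import Callable


def prior_density(c: float) -> float:
    """lambda(c) = cos^2(pi * c / 2) on (-1, 1), zero elsewhere."""
    return math.cos(math.pi * c / 2) ** 2 if abs(c) < 1 else 0.0


def prior_derivative(c: float) -> float:
    """lambda'(c) = -(pi / 2) * sin(pi * c) on (-1, 1), zero elsewhere."""
    return -0.5 * math.pi * math.sin(math.pi * c) if abs(c) < 1 else 0.0


def fisher_information(density: Callable[[float], float],
                       derivative: Callable[[float], float],
                       a: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of I(lambda) over (-a, a)."""
    h = 2 * a / steps
    total = 0.0
    for k in range(steps):
        c = -a + (k + 0.5) * h
        total += derivative(c) ** 2 / density(c) * h
    return total


info = fisher_information(prior_density, prior_derivative, a=1.0)  # close to pi ** 2
```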


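Theorem 3.1. Assume that the family of distributions defined by (3.1) satisfies
assumption AI w.r.t. a vicinity $V$ of a given $F_0 \in \mathcal{F}$ and $\lambda_n(\cdot) \in \Phi(\alpha_n)$. Then the following
inequalities hold, where the infimum is taken over all estimators $\tilde\Psi_n$:

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - R_n(\Psi_n, F)\big) \ge -\int_{-\alpha_n}^{\alpha_n}\Big(\frac{\lambda_n'(c)}{n\lambda_n(c)} - B_{n,F_0}(c)\Big)^2\lambda_n(c)\,dc, \tag{3.3}$$

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - n^{-1}\sigma^2_{n,F}\big) \ge \int_{-\alpha_n}^{\alpha_n}\Big(2n^{-1}\lambda_n'(c)B_{n,F_0}(c) - n^{-2}\frac{(\lambda_n'(c))^2}{\lambda_n(c)}\Big)\,dc, \tag{3.4}$$

and in the case of the functional (2.2) with locally bounded $\sigma_F^2 = \mathrm{Var}_F\,\psi(X)$

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - n^{-1}\sigma_F^2\big) \ge \int_{-\alpha_n}^{\alpha_n}\Big(-n^{-1}\Delta_{n,F_0}(c)\lambda_n(c) + 2n^{-1}\lambda_n'(c)B_{n,F_0}(c) - n^{-2}\frac{(\lambda_n'(c))^2}{\lambda_n(c)}\Big)\,dc. \tag{3.5}$$

In applications below we will choose $\psi_n$ appearing in Theorem 3.1 so as to bring
the upper bound (3.1'') and lower bounds (3.3)-(3.5) as close as possible in their rates of
decrease. It seems tempting to optimize these lower bounds by a particular choice of $\lambda_n(\cdot)$,
a task which however we will not pursue here.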
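To get a feel for the size of the right-hand side of (3.3), one can evaluate it numerically for a given bias function $c \mapsto B_{n,F_0}(c)$. The sketch below is an assumption-laden illustration: it fixes the rescaled prior $\lambda_a(c) = a^{-1}\cos^2(\pi c/(2a))$ on $(-a, a)$, and the name `lower_bound_33` and the constant bias functions are hypothetical. For a constant bias $B$ the integral has the closed form $-(\pi^2/(a^2n^2) + B^2)$, which serves as a check.

```python
import math
from typing import Callable


def lower_bound_33(bias: Callable[[float], float], n: int, a: float,
                   steps: int = 20000) -> float:
    """Midpoint-rule value of the right-hand side of (3.3),
    -int_{-a}^{a} (lam'(c) / (n * lam(c)) - B(c))^2 * lam(c) dc,
    for the assumed prior lam(c) = cos^2(pi * c / (2a)) / a on (-a, a)."""
    h = 2 * a / steps
    total = 0.0
    for k in range(steps):
        c = -a + (k + 0.5) * h
        lam = math.cos(math.pi * c / (2 * a)) ** 2 / a
        dlam = -math.pi * math.sin(math.pi * c / a) / (2 * a * a)
        total += (dlam / (n * lam) - bias(c)) ** 2 * lam * h
    return -total


no_bias = lower_bound_33(lambda c: 0.0, n=100, a=0.5)     # equals -pi^2 / (a^2 n^2)
const_bias = lower_bound_33(lambda c: 0.1, n=100, a=0.5)  # equals -(pi^2 / (a^2 n^2) + 0.01)
```

A shorter prior support $a$ or a larger bias makes the bound more negative, which is why the applications balance $\alpha_n$ against the bias of $\psi_n$.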

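Proof. Denote $\theta(c) = \Psi(G_{n,c})$, $\sigma^2(c) = \sigma^2_{n,F_0}(c)$, $B(c) = B_{n,F_0}(c)$, $b(c) = b_n(c)$ and

$$g^{(n)}(x,c) = \prod_{i=1}^{n} g(x_i, c) = \exp\{n(c\Psi_n - b(c))\}.$$

By (3.1')

$$\theta(c) = b'(c) - B(c). \tag{3.6}$$

Hence

$$\frac{\partial}{\partial c}\log g^{(n)}(x,c) = n(\Psi_n - b'(c)) = n(\Psi_n - \theta(c) - B(c)). \tag{3.7}$$

Consider the Bayes estimator $\Psi_\lambda$ of $\Psi(F)$ w.r.t. the risk function $R_n(\Psi_n, F)$ and the prior
distribution $\Lambda$ induced on the subfamily $G_{n,c} \in V$, $|c| < \alpha_n$, by $\lambda_n(c)$:

$$\Psi_\lambda = \frac{\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\theta(c)\lambda_n(c)\,dc}{\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\lambda_n(c)\,dc} = \Psi_n + \mu_n(\Psi_n),$$

where

$$\mu_n(\Psi_n) = \frac{\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)(\theta(c) - \Psi_n)\lambda_n(c)\,dc}{\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\lambda_n(c)\,dc}.$$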


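One obtains

$$R_n(\Psi_\lambda, \Lambda) = \int_{-\alpha_n}^{\alpha_n} R_n(\Psi_\lambda, G_{n,c})\lambda_n(c)\,dc \tag{3.8}$$

$$= \int_{-\alpha_n}^{\alpha_n}\int_{\mathcal{X}^n}\big(\Psi_n + \mu_n(\Psi_n) - \theta(c)\big)^2\,dG^{(n)}_{n,c}(x)\,\lambda_n(c)\,dc = I_1 + I_2 + I_3,$$

where

$$I_1 = \int_{-\alpha_n}^{\alpha_n}\lambda_n(c)\,dc\int_{\mathcal{X}^n}(\Psi_n - \theta(c))^2\,dG^{(n)}_{n,c}(x) = \int_{-\alpha_n}^{\alpha_n}\big(n^{-1}\sigma^2(c) + B^2(c)\big)\lambda_n(c)\,dc,$$

$$I_2 = \int_{\mathcal{X}^n}\mu_n^2(\Psi_n)\,dF^{(n)}(x)\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\lambda_n(c)\,dc = \int_{\mathcal{X}^n}\frac{\Big(\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)(\theta(c) - \Psi_n)\lambda_n(c)\,dc\Big)^2}{\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\lambda_n(c)\,dc}\,dF^{(n)}(x),$$

$$I_3 = 2\int_{\mathcal{X}^n}\mu_n(\Psi_n)\,dF^{(n)}(x)\int_{-\alpha_n}^{\alpha_n}(\Psi_n - \theta(c))g^{(n)}(x,c)\lambda_n(c)\,dc = -2I_2.$$

Thus

$$R_n(\Psi_\lambda, \Lambda) = I_1 - I_2.$$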


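One then obtains further that

$$0 \le \int_{\mathcal{X}^n}\int_{-\alpha_n}^{\alpha_n}\Big(\mu_n(\Psi_n) - \Big(\frac{\lambda_n'(c)}{n\lambda_n(c)} - B(c)\Big)\Big)^2 g^{(n)}(x,c)\lambda_n(c)\,dc\,dF^{(n)}(x) = I_2 + I_4 + I_5, \tag{3.10}$$

where

$$I_4 = \int_{-\alpha_n}^{\alpha_n}\Big(\frac{\lambda_n'(c)}{n\lambda_n(c)} - B(c)\Big)^2\lambda_n(c)\,dc, \tag{3.11}$$

$$I_5 = -2\int_{\mathcal{X}^n}\mu_n(\Psi_n)\,dF^{(n)}(x)\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)\big(n^{-1}\lambda_n'(c) - B(c)\lambda_n(c)\big)\,dc$$

$$= -2\int_{\mathcal{X}^n}\mu_n(\Psi_n)\,dF^{(n)}(x)\int_{-\alpha_n}^{\alpha_n}\Big(-n^{-1}\frac{\partial g^{(n)}(x,c)}{\partial c} - g^{(n)}(x,c)B(c)\Big)\lambda_n(c)\,dc$$

$$= -2\int_{\mathcal{X}^n}\mu_n(\Psi_n)\int_{-\alpha_n}^{\alpha_n} g^{(n)}(x,c)(\theta(c) - \Psi_n)\lambda_n(c)\,dc\,dF^{(n)}(x) = -2I_2, \tag{3.12}$$

where integration by parts and relation (3.7) were used to obtain correspondingly the
second and third equalities. Thus $I_4 \ge I_2$ and (3.8)-(3.12) result in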


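$$R_n(\Psi_\lambda, \Lambda) \ge I_1 - I_4 = \int_{-\alpha_n}^{\alpha_n}\Big(R_n(\Psi_n, G_{n,c}) - \Big(\frac{\lambda_n'(c)}{n\lambda_n(c)} - B(c)\Big)^2\Big)\lambda_n(c)\,dc$$

$$= \int_{-\alpha_n}^{\alpha_n}\Big(n^{-1}\sigma^2(c)\lambda_n(c) + \Big(2n^{-1}\lambda_n'(c)B(c) - n^{-2}\frac{(\lambda_n'(c))^2}{\lambda_n(c)}\Big)\Big)\,dc,$$

wherefrom the theorem follows.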

Our next goal is two-fold. First it will be shown by the use of Theorem 3.1 that any
rate of the higher order improvement of first order asymptotically optimal estimators may
be anticipated, ranging from $(\log n)^{\alpha}n^{-2}$ to $((\underbrace{\log\log\ldots\log}_{k} n)^{\alpha} n)^{-1}$, $k = 1, 2, \ldots$, $\alpha > 0$,
for approximately integrable models. Second, a close resemblance will be exhibited
between next order optimal rates of improvement for such estimators and optimal rates of
estimability of some non-regular functionals, the common ground for a combined treatment
of these rather different problems being furnished by the notion of a.i.l. models.


4.  A.i.l.  Models:  First  Applications 


Without loss of generality we can restrict ourselves, within the scope of the paper, to
estimating the simplest functional

$$\Psi(F) = \int_{R^1} x\,dF.$$

As is well known, the tail behavior of the distributions $F \in \mathcal{F}$ is of primary importance in
assessing the asymptotic properties of the sample mean

$$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

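We will proceed by examining the crucial role of this tail behavior in assessing the best rates
of improving upon $\bar X_n$. In particular the quantity

$$\xi_F^{(2)}(\nu) = \int_{|x| > \nu} x^2\,dF$$

will matter, as may be inferred from the next lemma.

Lemma 4.1. Assume that for a given space of distributions $(\mathcal{F}, \mathcal{T})$ there exist a) a
continuous function $\xi(\nu)$, $\nu > 0$, decreasing to zero, a positive $\gamma$ and, for any $F \in \mathcal{F}$, such
$\gamma_1(F)$, $\gamma_2(F)$, $\gamma_3(F)$, $0 < \gamma_1(F) \le \gamma_2(F) \le \gamma$, $\gamma_3(\cdot)$ locally bounded in $F$, that

$$\gamma_1(F)\,\xi(\nu) \le \xi_F^{(2)}(\nu) \le \gamma_2(F)\,\xi(\nu), \quad \nu \ge \gamma_3(F), \tag{4.1}$$

and b) for any $F \in \mathcal{F}$ and any of its vicinities $V \subset \mathcal{F}$, a $\delta = \delta(F, V) > 0$ exists, such that the
family of distributions $G_c$ of the form

$$\frac{dG_c}{dF} = \exp\{cx^{(\nu)} - b(c)\}, \quad |c| < \delta\nu^{-1},$$

belongs to $V$ for all sufficiently large $\nu$, where

$$x^{(\nu)} = x\,\mathbf{1}_{(|x| \le \nu)}(x).$$

Define $\nu = \nu_n$ by the relation

$$\nu_n^2 = 2\gamma n\,\xi(\nu_n). \tag{4.2}$$

Let $\nu_n'$ be any sequence such that for $n \to \infty$

$$\frac{\nu_n'}{\nu_n} = 1 + o(1), \qquad \frac{\xi(\nu_n')}{\xi(\nu_n)} = 1 + o(1). \tag{4.3}$$

Denote $\psi_n(x) = x^{(\nu_n')}$ and let

$$\Psi_n = \frac{1}{n}\sum_{i=1}^{n}\psi_n(X_i). \tag{4.3'}$$

Then 1) $\bar X_n$ is exactly $(\nu_n/n)^2$-rate improvable and 2) locally uniformly in $F$,

$$R_n(\Psi_n, F) \le \frac{\sigma_F^2}{n} - \frac{\gamma_1(F)}{4\gamma}\Big(\frac{\nu_n}{n}\Big)^2(1 + o(1)), \quad n \to \infty.$$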
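The improving estimator of Lemma 4.1 averages the truncated observations $x^{(\nu)} = x\,\mathbf{1}(|x| \le \nu)$, so values beyond the level $\nu$ are set to zero rather than clipped. A minimal sketch follows; the function names and the toy sample are assumptions, and the level is passed in by hand rather than derived from the lemma.

```python
from typing import Sequence


def truncate(x: float, nu: float) -> float:
    """x^{(nu)} = x * 1(|x| <= nu): observations beyond the level are set to zero."""
    return x if abs(x) <= nu else 0.0


def truncated_mean(sample: Sequence[float], nu: float) -> float:
    """Psi_n = n^{-1} * sum of X_i^{(nu)}, the form of estimator (4.3')."""
    return sum(truncate(x, nu) for x in sample) / len(sample)


sample = [0.5, -1.0, 2.0, 40.0, -0.5]  # toy data with one extreme value
est = truncated_mean(sample, nu=3.0)   # the value 40.0 contributes zero
plain = sum(sample) / len(sample)      # ordinary sample mean, for comparison
```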


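Proof. Let

$$\xi_F^{(2)}(\nu, c) = \xi_{G_c}^{(2)}(\nu).$$

With notations (3.0), (3.0') one obtains

$$|B_{n,F}| \le (\nu_n')^{-1}\xi_F^{(2)}(\nu_n'), \tag{4.4}$$

$$\Delta_{n,F} = \sigma_F^2 - \sigma^2_{n,F} = \xi_F^{(2)}(\nu_n') + (2\Psi(F) + B_{n,F})B_{n,F} = \xi_F^{(2)}(\nu_n')(1 + o(1)), \quad (n \to \infty) \tag{4.5}$$

locally uniformly in $F$, and

$$\xi_F^{(2)}(\nu_n', c) \le e^{2\delta}\xi_F^{(2)}(\nu_n'). \tag{4.6}$$

Now applying Theorem 3.1 with $\psi_n(x) = x^{(\nu_n')}$, $\alpha_n = \delta(\nu_n')^{-1}$, $\lambda_n(c) = \alpha_n^{-1}\lambda(\alpha_n^{-1}c)$ where
$\lambda(\cdot) \in \Phi(1)$, $\lambda_1 = \int|\lambda'(c)|\,dc$, $\lambda_2 = \int\frac{(\lambda'(c))^2}{\lambda(c)}\,dc$, $\lambda_1, \lambda_2 < \infty$, one obtains from (3.5),
using (4.2)-(4.6),

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - n^{-1}\sigma_F^2\big) \ge -n^{-1}\sup_{|c| < \alpha_n}\xi_F^{(2)}(\nu_n', c)(1 + \delta^{-1}\lambda_1)(1 + o(1)) - \lambda_2\delta^{-2}\Big(\frac{\nu_n'}{n}\Big)^2$$

$$\ge -n^{-1}e^{2\delta}\gamma_2(F)\xi(\nu_n')(1 + \delta^{-1}\lambda_1)(1 + o(1)) - \lambda_2\delta^{-2}\Big(\frac{\nu_n'}{n}\Big)^2$$

$$= -\Big(\frac{e^{2\delta}\gamma_2(F)}{2\gamma}(1 + \delta^{-1}\lambda_1) + \lambda_2\delta^{-2}\Big)\Big(\frac{\nu_n}{n}\Big)^2(1 + o(1)), \quad (n \to \infty),$$

proving assertion 1).

Applying once again (4.2)-(4.6) one obtains further, locally uniformly in $F$:

$$R_n(\Psi_n, F) = n^{-1}\sigma^2_{n,F} + B^2_{n,F}$$

$$\le n^{-1}\sigma_F^2 + \big(-n^{-1}\xi_F^{(2)}(\nu_n') + (\nu_n')^{-2}(\xi_F^{(2)}(\nu_n'))^2\big)(1 + o(1))$$

$$\le n^{-1}\sigma_F^2 - \frac{1}{2n}\xi_F^{(2)}(\nu_n')(1 + o(1))$$

$$\le n^{-1}\sigma_F^2 - \frac{\gamma_1(F)}{2n}\xi(\nu_n)(1 + o(1))$$

$$= n^{-1}\sigma_F^2 - \frac{\gamma_1(F)}{4\gamma}\Big(\frac{\nu_n}{n}\Big)^2(1 + o(1)),$$

proving assertion 2).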

We present next a few examples in which Lemma 4.1 can be effectively used to define
rates of improvement of the sample mean $\bar X_n = n^{-1}\sum_{i=1}^{n} X_i$. Notice that in Examples 1-3
$\bar X_n$ is a first order asymptotically optimal estimator of $\Psi(F) = E_F X$ due to Theorem 2.1.
Below $\bar F(x)$ stands for $1 - F(x) + F(-x)$, $(x > 0)$.

Example 1. For given $\alpha, \beta, \gamma$ and a real $p$ consider the class $\mathcal{F}$ of distribution functions
$F(x)$, $(x \in R^1)$ such that for some $\gamma_1(F)$, $\gamma_2(F)$, $0 < \gamma_1(F) \le \gamma_2(F) \le \gamma$ and a locally
bounded function $\gamma_3(F)$

$$\gamma_1(F) \le (x^{p}e^{-\beta x^{\alpha}})^{-1}\bar F(x) \le \gamma_2(F), \quad x \ge \gamma_3(F).$$

Let

$$\nu_n = \Big(\frac{1}{\beta}\log 2\gamma n + \frac{p}{\alpha\beta}\log\frac{\log 2\gamma n}{\beta}\Big)^{1/\alpha}$$

and $\Psi_n$ be defined as in (4.3').


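Proposition 4.1 a) $\bar X_n$ is exactly $(\log n)^{2/\alpha}/n^{2}$-rate improvable on $\mathcal{F}$ and b) locally uniformly
in $F \in \mathcal{F}$

$$R_n(\Psi_n, F) \le \frac{\sigma^2(F)}{n} - \frac{\gamma_1(F)}{4\gamma\beta^{2/\alpha}}\frac{(\log n)^{2/\alpha}}{n^2}(1 + o(1)), \quad n \to \infty.$$

Notice that the smaller $\alpha$ is, i.e. the heavier the tails of $F$ are, the larger is the
improvement rate of the sample mean.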


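Proof. Using the relations

$$\xi_F^{(2)}(\nu) = \int_{|x| > \nu} x^2\,dF(x) = 2\int_{\nu}^{\infty} x\bar F(x)\,dx + \nu^2\bar F(\nu) \tag{4.7}$$

and

$$\int_{\nu}^{\infty} x^{p+1}e^{-\beta x^{\alpha}}\,dx = \frac{\nu^{p+2-\alpha}e^{-\beta\nu^{\alpha}}}{\alpha\beta}(1 + o(1)), \quad (\nu \to \infty)$$

one readily verifies the relation (4.1) with $\xi(\nu) = \nu^{p+2}e^{-\beta\nu^{\alpha}}$. Clearly the assumption
b) of Lemma 4.1 holds with some sufficiently small $\delta > 0$. Thus the proposition follows
immediately from Lemma 4.1.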

Example 2. Assume that for some $\alpha > 2$, $\gamma > 0$

$$\mathcal{F} = \{F \mid \gamma_1(F) \le \bar F(x)x^{\alpha} \le \gamma_2(F),\ x \ge \gamma_3(F)\}$$

where $0 < \gamma_1(F) \le \gamma_2(F) \le \gamma$ and $\gamma_3(F)$ is locally bounded.

Let

$$\nu_n = \Big(\frac{2\alpha\gamma n}{\alpha - 2}\Big)^{1/\alpha}$$

and $\Psi_n$ be defined as above in (4.3').

Proposition 4.2 a) $\bar X_n$ is exactly $n^{-2(\alpha-1)/\alpha}$-rate improvable on $\mathcal{F}$ and b) locally uniformly
in $F$

$$R_n(\Psi_n, F) \le \frac{\sigma^2(F)}{n} - \frac{\gamma_1(F)}{4\gamma}\Big(\frac{2\alpha\gamma}{\alpha - 2}\Big)^{2/\alpha}n^{-2(\alpha-1)/\alpha}(1 + o(1)), \quad (n \to \infty).$$

Notice that again the smaller $\alpha$ is, the higher is the improvement rate of the sample
mean.

Proof. By (4.7) the relation (4.1) holds with $\xi(\nu) = \frac{\alpha}{\alpha - 2}\nu^{2-\alpha}$. Thus Proposition 4.2 is
implied again by Lemma 4.1 along the argument already used in proving Proposition 4.1.
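The identity (4.7) behind this proof can be verified numerically for an exact power tail. The sketch below assumes, for illustration only, a distribution of $|X|$ with $\bar F(x) = x^{-\alpha}$ for $x \ge 1$; both sides of (4.7) are integrated after the substitution $x = \nu/t$ and compared with the closed form $\frac{\alpha}{\alpha-2}\nu^{2-\alpha}$.

```python
from typing import Callable


def tail_integral(f: Callable[[float], float], nu: float, steps: int = 100000) -> float:
    """Integral of f over (nu, infinity), via x = nu / t and a midpoint rule in t on (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = nu / t
        total += f(x) * nu / (t * t) * h
    return total


alpha, nu = 4.0, 2.0                                 # illustrative values, alpha > 2
tail = lambda x: x ** (-alpha)                       # assumed tail: bar F(x) = x^{-alpha}, x >= 1
density = lambda x: alpha * x ** (-alpha - 1.0)      # density of |X| for that tail

lhs = tail_integral(lambda x: x * x * density(x), nu)                       # xi_F^{(2)}(nu)
rhs = 2.0 * tail_integral(lambda x: x * tail(x), nu) + nu * nu * tail(nu)   # right side of (4.7)
closed_form = alpha / (alpha - 2.0) * nu ** (2.0 - alpha)
```

All three quantities coincide up to rounding for these values, which is the computation giving $\xi(\nu)$ in Example 2.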

Denote

$$\log_k x = \underbrace{\log\log\ldots\log}_{k\ \text{times}}\,x.$$
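The iterated logarithm $\log_k x$ grows extremely slowly, which is what drives the unusual rates in the examples that follow. A small sketch (the function name `iterated_log` is an assumption):

```python
import math


def iterated_log(x: float, k: int) -> float:
    """log_k x: the natural logarithm applied k times (k >= 1)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    value = x
    for _ in range(k):
        if value <= 0.0:
            raise ValueError("iterated logarithm is undefined for this argument")
        value = math.log(value)
    return value


one = iterated_log(math.exp(math.e), 2)  # log log e^e = 1
slow = iterated_log(1e100, 3)            # still below 2 for x = 10^100
```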

Example  3.  Assume  that  for  some  a  >  1,  k  =  1,2, ... 

*-i 

<  72(f).  *  >  73(f)) 

1=1 

for  some  0  <  71(F)  <  72(F)  <  00  and  a  locally  bounded  73(F). 

The example exhibits the following peculiar properties. First, the attainable rate of
improvement of $\bar X_n$ is very high, namely $((\log_k n)^{\alpha-1}n)^{-1}$, which is practically
comparable to the order $n^{-1}$ of the leading term of the risk $R_n(\bar X_n, F)$ for most sample sizes.
This apparently suggests that in a still larger class of nonparametric problems the first
order asymptotic optimality of a given estimator cannot be taken as a guard against its
improvability in some reasonable applications by appealing to higher order properties.

Second, in distinction to the former Examples 1 and 2, the improving estimator we present
below is even second order unimprovable, or second order admissible. This sort of
conclusion, which can be drawn with the help of Theorem 3.1 whenever the bias and variance
terms do not match each other, does not seem to be excessive when the higher order terms
of the risk expansion fall close to the leading one.
Let

Proposition 4.3 a) $\bar X_n$ is exactly $((\log_k n)^{\alpha-1}n)^{-1}$-rate improvable on $\mathcal{F}$; b) locally
uniformly in $F$

$$R_n(\Psi_n, F) \le n^{-1}\sigma_F^2 - 2\gamma_1(F)\big((\log_k\sqrt{n})^{\alpha-1}n\big)^{-1}(1 + o(1)), \quad (n \to \infty)$$

and c) $\Psi_n$ is second order admissible, or $((\log_k n)^{\alpha-1}n)^{-1}$-rate unimprovable on $\mathcal{F}$.


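Proof. It follows from (4.7) that locally uniformly in $F$

$$2\gamma_1(F)(\log_k\nu)^{1-\alpha}(1 + o(1)) \le \xi_F^{(2)}(\nu) \le 2\gamma_2(F)(\log_k\nu)^{1-\alpha}(1 + o(1)), \quad (\nu \to \infty). \tag{4.8}$$

Using relations

$$|B_{n,F}| \le \xi_F^{(1)}(\nu) = \int_{|x| > \nu}|x|\,dF(x) = \int_{\nu}^{\infty}\bar F(x)\,dx + \nu\bar F(\nu) \tag{4.9}$$

one obtains similarly

$$\xi_F^{(1)}(\nu) \le \begin{cases} 2\gamma_2(F)(\nu\log\nu)^{-1}(1 + o(1)), & k > 1, \\ 2\gamma_2(F)(\nu(\log\nu)^{\alpha})^{-1}(1 + o(1)), & k = 1. \end{cases} \tag{4.10}$$

Thus (3.1''), (4.5) result in the following:

$$R_n(\Psi_n, F) = n^{-1}\sigma_F^2 - n^{-1}\xi_F^{(2)}(\nu_n')(1 + o(1)) + B^2_{n,F} \le n^{-1}\sigma_F^2 - 2\gamma_1(F)\big((\log_k\sqrt{n})^{\alpha-1}n\big)^{-1}(1 + o(1)),$$

proving the second assertion of the proposition.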

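To prove the first and last statements notice that for any non-void vicinity $V$ of
$F$ there exists $\delta > 0$ such that the family $G_{n,c}$ defined by (3.1) with $\psi_n(x) = x^{(\nu_n')}$,
$|c| < \alpha_n = \delta(\nu_n')^{-1}$, belongs to $V$. Now using the inequality (3.3) with $\lambda_n(c)$ as in
Lemma 4.1 one obtains from (4.9)

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - R_n(\Psi_n, F)\big) \ge -\sup_{|c| < \alpha_n}B^2_{n,F}(c) - \frac{2\lambda_1}{\alpha_n n}\sup_{|c| < \alpha_n}|B_{n,F}(c)| - \frac{\lambda_2}{(\alpha_n n)^2}$$

$$= \begin{cases} O\big((n\log n)^{-1}\big), & k > 1 \\ O\big((n(\log n)^{\alpha})^{-1}\big), & k = 1 \end{cases} \;=\; o\big((n(\log_k n)^{\alpha-1})^{-1}\big), \quad (n \to \infty). \tag{4.11}$$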
Notice that the logarithmic term incorporated in $\nu_n'$ is essential only in deriving the
lower bound (4.11), while a simpler estimator $\Psi_n$ with $\psi_n(x) = x^{(\sqrt{n})}$ satisfies both the
assertions b), c) of Proposition 4.3.

So far we have analyzed higher order asymptotic properties of $\bar X_n$ under progressively
heavier tail behavior of the underlying distribution $F \in \mathcal{F}$. It is only natural to inquire
further what happens with this estimator when $F$ ranges over the class

$$\mathcal{F} = \{F:\ \gamma_1(F) \le x^2\bar F(x) \le \gamma_2(F),\ x \ge \gamma_3(F)\}$$

where $0 < \gamma_1(F) \le \gamma_2(F) < \infty$ and $\gamma_3(F)$ is locally bounded.

Notice that $\bar X_n$ is no longer first order asymptotically optimal, or even of finite risk, in
that case. Still, Theorem 3.1 allows us to arrive at a meaningful result and moreover
exhibits a new kind of phenomenon. We shall see that there still exists an asymptotically
optimal estimator $\Psi_n$ of the mean $E_F X$, which however is in that case only $\log(n)/n$-rate
consistent, and moreover the normalized risk $\frac{n}{\log n}R_n(\Psi_n, F)$ does not need to converge.

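Define

$$\eta_F^{(2)}(\nu) = \int_{|x| \le \nu} x^2\,dF(x)$$

and let

$$\nu_n = \sqrt{n}, \qquad \psi_n(x) = x^{(\nu_n)}, \qquad \Psi_n = \frac{1}{n}\sum_{i=1}^{n}\psi_n(X_i).$$

Proposition 4.4. a) The functional $\Psi(F) = E_F X$ is exactly $\log(n)/n$-rate estimable on
$\mathcal{F}$; b)

$$R_n(\Psi_n, F) \le \frac{2\gamma_2(F)\log n}{n}(1 + o(1))$$

locally uniformly in $F$ and c) $\Psi_n$ is first order asymptotically optimal and exactly $n^{-1}$-rate
improvable on $\mathcal{F}$.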


Proof. The inequality (4.9) when applied to $F \in \mathcal{F}$ gives locally uniformly in $F$

$$|B_{n,F}| \le \frac{2\gamma_2(F)}{\nu_n}(1 + o(1)).$$

On the other hand

$$\eta_F^{(2)}(\nu) = \int_{|x| \le \nu} x^2\,dF = \int_0^{\nu} 2x\bar F(x)\,dx - \nu^2\bar F(\nu) \le 2\gamma_2(F)\log\nu\,(1 + o(1)), \quad \nu \to \infty.$$

Thus

$$R_n(\Psi_n, F) = n^{-1}\mathrm{Var}_F\,X^{(\nu_n)} + B^2_{n,F} \le n^{-1}\eta_F^{(2)}(\nu_n) + B^2_{n,F} \le \frac{2\gamma_2(F)\log n}{n}(1 + o(1)).$$

Now the same argument leading to (4.11) applies with $\alpha_n = \delta\nu_n^{-1}$ and

$$\sup_{|c| < \alpha_n}B^2_{n,F}(c) + \frac{2\lambda_1}{\alpha_n n}\sup_{|c| < \alpha_n}|B_{n,F}(c)| + \frac{\lambda_2}{(\alpha_n n)^2} = O(n^{-1}),$$

implying that $\Psi_n$ is at most $n^{-1}$-rate improvable. That it is indeed improvable at that rate
can be easily demonstrated by considering the estimator $n^{-1}\sum_{i=1}^{n} X_i^{(\delta\nu_n)}$ with $\delta$ sufficiently
small.


Remark 4.1. The way we have defined the class $\mathcal{F}$ is essential for the crucial assumption
AI of Theorem 3.1 to be fulfilled, while the very definition of $\mathcal{F}$ allows for oscillations
in the behavior of the normalized risk of $\Psi_n$.

Proceeding further with heavy tailed distributions $F$ one is led to considering
nonregular linear functionals, which are still covered by Theorem 3.1; the theorem allows
conclusions on optimal rates to be derived for such functionals.

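Example 5. Let, for some $\alpha$, $\gamma$, $1 < \alpha < 2$, $0 < \gamma < \infty$,

$$\mathcal{F} = \{F \mid \gamma_1(F) \le x^{\alpha}\bar F(x) \le \gamma_2(F),\ x \ge \gamma_3(F)\}$$

where $0 < \gamma_1(F) \le \gamma_2(F) \le \gamma$, $\gamma_3(F)$ being locally bounded.

Define

$$\psi_n(x) = \min(|x|, \nu)\,\mathrm{sign}\,x, \qquad \sigma^2_{n,F} = \mathrm{Var}_F\,\psi_n(X) \tag{4.12}$$

with $\nu = \nu_n \to \infty$ to be defined below.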
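Unlike the truncation $x^{(\nu)}$ of Lemma 4.1, the function $\psi_n$ in (4.12) clips large observations to $\pm\nu$ instead of discarding them. A minimal sketch, with hypothetical function names and toy data:

```python
import math
from typing import Sequence


def clip(x: float, nu: float) -> float:
    """psi_n(x) = min(|x|, nu) * sign(x), as in (4.12)."""
    return math.copysign(min(abs(x), nu), x)


def clipped_mean(sample: Sequence[float], nu: float) -> float:
    """n^{-1} * sum psi_n(X_i) with the clipping function of (4.12)."""
    return sum(clip(x, nu) for x in sample) / len(sample)


sample = [0.5, -1.0, 2.0, 40.0, -0.5]  # toy data with one extreme value
est = clipped_mean(sample, nu=3.0)     # the value 40.0 is replaced by 3.0
```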

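In asserting lower bounds in this and the next examples we use the following lemma.

Lemma 4.2. Let $V \subset \mathcal{F}$ be a vicinity of a given $F \in \mathcal{F}$ with $\Psi(\cdot)$ bounded on $V$ and the
family $G_{n,c}$ be defined by (3.1), (4.12). Assume that $G_{n,c} \in V$, $|c| < \delta\nu_n^{-1}$, for some $\delta > 0$.
Let $\lambda_n(c) = \delta^{-1}\nu_n\lambda(\delta^{-1}\nu_n c)$, where $\lambda \in \Phi(1)$ is a symmetric density with $\lambda_2 = \int\frac{(\lambda'(c))^2}{\lambda(c)}\,dc < \infty$.
Then

$$\inf_{\tilde\Psi_n}\sup_{F \in V}\big(R_n(\tilde\Psi_n, F) - n^{-1}\sigma^2_{n,F}\big) \ge -\frac{2\nu_n}{\delta n}\int_{\nu_n}^{\infty}\bar F(x)\,dx\int_0^1(e^{\delta c} - e^{-\delta c})\lambda'(c)\,dc\,(1 + o(1)) - \frac{\lambda_2\nu_n^2}{\delta^2n^2}, \quad n \to \infty.$$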

I'n  0 


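Proof. For a fixed $\delta$ and $|c| < \delta\nu_n^{-1}$ one obtains

$$e^{b_n(c)} = E_F e^{c\psi_n(X)} = E_F\big(1 + c\psi_n(X) + O((c\psi_n(X))^2)\big) = 1 + O(\nu_n^{-1}) = 1 + o(1), \quad n \to \infty.$$

Next, with $B_{n,F}(c) = E_{G_{n,c}}(\psi_n(X) - X)$, integration by parts results in the following
relations

$$B_{n,F}(c) = \int_{\nu}^{\infty} x\,d\big(1 - G_{n,c}(x) - G_{n,c}(-x)\big) + \nu\big(1 - G_{n,c}(\nu) - G_{n,c}(-\nu)\big)$$

$$= -\int_{\nu}^{\infty}\big(1 - G_{n,c}(x) - G_{n,c}(-x)\big)\,dx$$

$$= -e^{-b_n(c)}\int_{\nu}^{\infty}\big(e^{c\nu}(1 - F(x)) - e^{-c\nu}F(-x)\big)\,dx$$

$$= -(1 + o(1))\int_{\nu}^{\infty}\big(e^{c\nu}(1 - F(x)) - e^{-c\nu}F(-x)\big)\,dx. \tag{4.13}$$

Thus

$$B_{n,F}(c) - B_{n,F}(-c) = -(1 + o(1))(e^{c\nu} - e^{-c\nu})\int_{\nu}^{\infty}\bar F(x)\,dx$$

so that

$$\int_{-\delta\nu^{-1}}^{\delta\nu^{-1}}\lambda_n'(c)B_{n,F}(c)\,dc = \int_0^{\delta\nu^{-1}}\lambda_n'(c)\big(B_{n,F}(c) - B_{n,F}(-c)\big)\,dc$$

$$= -\frac{\nu}{\delta}\int_{\nu}^{\infty}\bar F(x)\,dx\int_0^1(e^{c\delta} - e^{-c\delta})\lambda'(c)\,dc\,(1 + o(1)), \quad n \to \infty,\ \nu \to \infty.$$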


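Hence using the inequality (3.4) of Theorem 3.1 with $\lambda_n$ as specified gives the result in
question.

Consider next an estimator of $\Psi(F) = E_F X$ of the form

$$\Psi_n = n^{-1}\sum_{i=1}^{n}\psi_n(X_i) \tag{4.13'}$$

where $\psi_n$ is defined through (4.12) with

$$\nu_n = \rho n^{1/\alpha}, \quad \rho > 0.$$

Proposition 4.5. a) $\Psi(F)$ is exactly $n^{-2(\alpha-1)/\alpha}$-rate estimable on $\mathcal{F}$ and b) locally uniformly
in $F$

$$R_n(\Psi_n, F) \le \Big(\frac{2\gamma_2(F)}{2 - \alpha}\rho^{2-\alpha} + \Big(\frac{\gamma_2(F)}{\alpha - 1}\Big)^2\rho^{2(1-\alpha)}\Big)n^{-2(\alpha-1)/\alpha}(1 + o(1)), \quad n \to \infty.$$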


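Proof. Let $V$ be a non-empty vicinity in $\mathcal{F}$, $F \in V$. Using the family $G_{n,c}$ as in Lemma 4.2
it is easy to check that $G_{n,c} \in V$ for $|c| < \delta\nu_n^{-1}$ and sufficiently small $\delta > 0$. Thus by
Lemma 4.2

$$\inf_{\tilde\Psi_n}\sup_{F \in V} R_n(\tilde\Psi_n, F) \ge \frac{2\gamma_1(F)\lambda_3\nu_n^{2-\alpha}}{\delta(\alpha - 1)n}(1 + o(1)) - \frac{\lambda_2\nu_n^2}{\delta^2n^2}$$

$$= \Big(\frac{2\gamma_1(F)\lambda_3}{\delta(\alpha - 1)\rho^{\alpha}} - \frac{\lambda_2}{\delta^2}\Big)\rho^2n^{-2(\alpha-1)/\alpha}(1 + o(1)),$$

where

$$\lambda_3 = -\int_0^1\lambda'(c)(e^{c\delta} - e^{-c\delta})\,dc \tag{4.14}$$

can be made positive by a proper choice of $\lambda(\cdot)$, e.g. by making $\lambda'(c)$ negative for $0 < c < 1$.
Choose further $\rho$ small enough to make the bound positive, ensuring the lower rate bound
as stated.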

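To prove the last statement one obtains

$$\sigma^2_{n,F} \le E_F\psi_n^2(X) = \int_{|x| \le \nu} x^2\,dF + \nu^2\bar F(\nu) = 2\int_0^{\nu} x\bar F(x)\,dx \le \frac{2\gamma_2(F)}{2 - \alpha}\nu^{2-\alpha}(1 + o(1)) \tag{4.15}$$

and along the lines of (4.13)

$$|B_{n,F}| = |E_F\psi_n(X) - \Psi(F)| \le \int_{\nu}^{\infty}\bar F(x)\,dx \le \frac{\gamma_2(F)}{\alpha - 1}\nu^{1-\alpha}(1 + o(1)). \tag{4.16}$$

Thus with $\nu = \nu_n = \rho n^{1/\alpha}$, $\rho > 0$

$$R_n(\Psi_n, F) = n^{-1}\sigma^2_{n,F} + B^2_{n,F} \le \Big(\frac{2\gamma_2(F)}{2 - \alpha}\rho^{2-\alpha} + \Big(\frac{\gamma_2(F)}{\alpha - 1}\Big)^2\rho^{2(1-\alpha)}\Big)n^{-2(\alpha-1)/\alpha}(1 + o(1))$$

locally uniformly in $F$.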
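The choice $\nu_n = \rho n^{1/\alpha}$ balances the variance term, of order $\nu^{2-\alpha}/n$ by (4.15), against the squared bias, of order $\nu^{2(1-\alpha)}$ by (4.16). The sketch below checks this balance numerically; constants such as $\gamma_2(F)$ are deliberately dropped and the function names are assumptions, so only the orders of magnitude are meaningful.

```python
import math


def variance_order(n: float, nu: float, alpha: float) -> float:
    """Order of n^{-1} * sigma^2_{n,F}: nu^(2 - alpha) / n, constants dropped."""
    return nu ** (2.0 - alpha) / n


def squared_bias_order(nu: float, alpha: float) -> float:
    """Order of B^2_{n,F}: nu^(2 * (1 - alpha)), constants dropped."""
    return nu ** (2.0 * (1.0 - alpha))


def balanced_level(n: float, alpha: float, rho: float = 1.0) -> float:
    """Clipping level nu_n = rho * n^(1 / alpha)."""
    return rho * n ** (1.0 / alpha)


def rate(n: float, alpha: float) -> float:
    """Resulting rate n^(-2 * (alpha - 1) / alpha)."""
    return n ** (-2.0 * (alpha - 1.0) / alpha)


alpha, n = 1.5, 1.0e6             # illustrative values with 1 < alpha < 2
nu_n = balanced_level(n, alpha)
v = variance_order(n, nu_n, alpha)
b2 = squared_bias_order(nu_n, alpha)
r = rate(n, alpha)
```

With $\rho = 1$ all three quantities coincide, and any other power of $n$ for $\nu_n$ makes one of the two terms dominate.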


A  slightly  different  upper  bound  would  result  for  the  estimator  ^nl  =  n_1  ^  A", 


(Vn) 


t=i 


3(1  — o) 

xn  <*  (l  +  o(l)). 


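Example 6. Let for some integer $k \ge 1$ and given $\alpha > 1$, $\gamma > 0$

$$\mathcal{F} = \Big\{F \,\Big|\, \gamma_1(F) \le x\prod_{i=1}^{k-1}\log_i x\,(\log_k x)^{\alpha}\,\bar F(x) \le \gamma_2(F),\ x \ge \gamma_3(F)\Big\}$$

with some $0 < \gamma_1(F) \le \gamma_2(F) \le \gamma$, $\gamma_3(F)$ being locally bounded.

Proposition 4.6. a) The functional $\Psi(F) = E_F X$ is exactly $(\log_k n)^{2(1-\alpha)}$-rate estimable
on $\mathcal{F}$ and b) the estimator (4.13'), (4.12) with $\nu = \nu_n = n$ satisfies the relation

$$R_n(\Psi_n, F) \le \gamma_2^2(F)(\log_k n)^{2(1-\alpha)}(1 + o(1))$$

locally uniformly in $F$.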


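Proof. Applying Lemma 4.2 in the same manner as in Proposition 4.5 with

$$\nu = \nu_n = \rho n(\log_k n)^{1-\alpha}$$

one obtains for an arbitrary vicinity $V \subset \mathcal{F}$, $F \in V$,

$$\inf_{\tilde\Psi_n}\sup_{F \in V} R_n(\tilde\Psi_n, F) \ge \Big(\frac{2\gamma_1(F)\rho\lambda_3(1 + o(1))}{\delta(\log_k\rho n)^{\alpha-1}(\log_k n)^{\alpha-1}} - \frac{\lambda_2\rho^2}{\delta^2(\log_k n)^{2(\alpha-1)}}\Big)$$

with a positive $\delta$ and $\lambda_3 > 0$. For a sufficiently small $\rho$ this gives a lower bound

$$\inf_{\tilde\Psi_n}\sup_{F \in V} R_n(\tilde\Psi_n, F) \ge c(V)(\log_k n)^{2(1-\alpha)}$$

with a positive constant $c(V)$.

Now for the estimator $\Psi_n$, (4.15), (4.16) give with $\nu = n$:

$$\sigma^2_{n,F} \le 2\int_0^{\nu} x\bar F(x)\,dx \le 2\gamma_2(F)\int_{\gamma_3(F)}^{\nu}\Big(\prod_{i=1}^{k-1}\log_i x\,(\log_k x)^{\alpha}\Big)^{-1}dx\,(1 + o(1))$$

$$\le \begin{cases} 2\gamma_2(F)\,\nu(\log\nu)^{-1}(1 + o(1)), & k > 1 \\ 2\gamma_2(F)\,\nu(\log\nu)^{-\alpha}(1 + o(1)), & k = 1 \end{cases} \quad (\nu \to \infty)$$

and

$$|B_{n,F}| \le \int_{\nu}^{\infty}\bar F(x)\,dx \le \gamma_2(F)(\log_k\nu)^{1-\alpha}(1 + o(1)), \quad \nu \to \infty,$$

wherefrom the statement b) follows.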

The example just considered appears to be instructive in several respects. First, it
exhibits an estimator with an extremely slow, though best attainable, speed of convergence.
Next, it differs from the previous ones (as well as from many other estimation problems) in that
the risk of the estimator with the best convergence rate is mainly contributed by the bias rather
than the variance term. Notice that just as in the two previous examples there exists an
estimator with quadratic risk tending to zero at the best rate, though the sample mean
clearly does not even have a finite second moment.

The Examples 1-6 feature the sort of results one can arrive at with the introduced
notion of a.i.l. models. Further applications to a wider class of functionals will be presented
elsewhere.